UNITED STATES PATENT APPLICATION FOR 



PROVIDING INSTRUCTION EXECUTION 
HINTS TO A PROCESSOR USING 
BREAK INSTRUCTIONS 



Inventors:
Alan H. Karp 
Rajiv Gupta 



CERTIFICATE OF MAILING BY "EXPRESS MAIL"
UNDER 37 C.F.R. § 1.10

"Express Mail" mailing label number: [illegible]
Date of Mailing: [illegible]

I hereby certify that this correspondence is 
being deposited with the United States Postal 
Service, utilizing the "Express Mail Post Office to 
Addressee" service addressed to Assistant 
Commissioner for Patents, Washington, D.C. 20231 and 
mailed on the above Date of Mailing with the above 
"Express Mail" mailing label number. 






Paul H. Horstmann, Reg. No. 36,167
Signature Date: [illegible]






BACKGROUND OF THE INVENTION 

Field of Invention 

The present invention pertains to the field of
computer systems. More particularly, this invention
relates to providing instruction execution hints to a
processor.

Art Background 

A computer system usually includes one or more
processors which execute instructions. A processor
may also be referred to as a central processing unit.
A processor typically conforms to a macro-architecture
which specifies an instruction set, a set of
architectural registers, etc. for code executed by the
processor.

The code executed by a processor is usually
referred to as object code. Typically, the object
code executed by a processor is generated by a
compiler. It is usually desirable to implement a
compiler so that it generates object code in a manner
that will enhance the speed at which the object code
is executed by a processor. For example, it is
common for a compiler to generate object code for a
processor based on micro-architecture features of the
processor such as on-chip caches, out-of-order
processing capabilities, branch prediction
capabilities, etc.

It is common for processor manufacturers to
provide a family of processors that conform to a
given macro-architecture. Processors in a family
usually vary according to micro-architecture features
such as on-chip caches, out-of-order processing
capabilities, branch prediction capabilities, etc.

Unfortunately, object code which is compiled
based on the micro-architecture features of one
member of a processor family may suffer in
performance when executed on another member of the
family. For example, object code that includes
pre-fetch instructions which are adapted to a processor
having a particular size of on-chip cache may hinder
the performance of a processor having a smaller or
non-existent on-chip cache.

Some prior systems use a re-compiler to
translate object code which is optimized for one
member of a processor family to another member of the
processor family. Unfortunately, such object code
translations usually alter object code sequences,
which can cause errors.

SUMMARY OF THE INVENTION

A computer system is disclosed with mechanisms
for providing hint instructions to a processor
without altering object code instruction sequences.
A computer system according to the present teachings
includes elements for generating a hint instruction
in response to a set of object code to be executed by
the processor and for inserting a break instruction
into the object code such that the break instruction
causes the processor to obtain and execute the hint
instruction. The present techniques for providing
hint instructions to a processor may be used to adapt
object code to a micro-architecture of the processor.

Other features and advantages of the present 
invention will be apparent from the detailed 
description that follows. 



Attorney Docket No. 10980982 



BRIEF DESCRIPTION OF THE DRAWINGS 



The present invention is described with respect 
to particular exemplary embodiments thereof and 
reference is accordingly made to the drawings in 
which:

Figure 1 illustrates a computer system which 
provides hint instructions to a processor according 
to the present teachings; 

Figure 2 shows another computer system which 
provides hint instructions to a processor according 
to the present teachings; 

Figure 3 shows the handling of a break
instruction by a processor according to the present
teachings;

Figure 4 shows a method for adapting an
instruction stream to a micro-architecture of a
processor according to the present teachings;

Figure 5 shows an example micro-architecture for
a processor;

Figure 6 shows an instruction pipeline one
processor cycle after a break instruction is
detected.

DETAILED DESCRIPTION

Figure 1 illustrates a computer system 200 which
provides hint instructions to a processor 10
according to the present teachings. The computer
system 200 includes an object code adapter 14 which
provides hint instructions to the processor 10 using
a mechanism for handling break instructions which is
built in to the processor 10. In one embodiment, the
object code adapter 14 uses the present techniques to
adapt a set of object code 60 to a micro-architecture
of the processor 10.

The object code 60 includes a sequence of
instructions I_1 through I_n in object code according to
the macro-architecture of the processor 10. The
macro-architecture of the processor 10 defines an
instruction set, a set of architectural registers,
an address space, etc. for the processor 10. The
micro-architecture of the processor 10 defines a set
of capabilities and/or characteristics implemented in
the processor 10 such as branch prediction
capability, on-chip cache, instruction pipeline
length, etc.

The object code adapter 14 adapts the object
code 60 by providing hint instructions to the
processor 10 in response to the micro-architecture of
the processor 10 and the instructions I_1 through I_n
contained in the object code 60. In one embodiment,
the object code adapter 14 generates a set of object
code 62 and a set of hint code 64 in response to the
object code 60.

The object code adapter 14 generates the object
code 62 by inserting break instructions in place of
selected instructions in the object code 60. For
example, the object code adapter 14 replaces the
instruction I_3 with a break instruction B_1.

The hint code 64 is code to be executed by the
processor 10 when the break instruction B_1 is
executed. The hint code 64 includes a hint
instruction H_1 and the instruction I_3 that was
replaced by the break instruction B_1, and may include
additional instructions, including additional hint
instructions, depending on the type of adaptation
and/or optimization performed by the object code
adapter 14. The hint code 64 may include a branch or
return instruction to resume execution of the object
code 62 depending on the implementation of the break
mechanism in the processor 10.
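
By way of illustration only, the replacement of the instruction I_3 with the
break instruction B_1 and the construction of the hint code 64 may be sketched
as follows. The string encodings, the "RETURN" marker, and the function names
adapt and run are hypothetical and are not part of the present teachings; the
sketch assumes a break mechanism that relies on a return instruction in the
hint code.

```python
# Illustrative sketch only: instruction encodings and names are hypothetical.
BREAK = "BREAK"
RETURN = "RETURN"


def adapt(object_code, index, hint):
    """Replace object_code[index] with a break and build the matching hint code."""
    adapted = list(object_code)      # object code 62: same length and order as 60
    replaced = adapted[index]
    adapted[index] = BREAK
    # Hint code 64: the hint, the replaced instruction, then a return (assumed).
    hint_code = [hint, replaced, RETURN]
    return adapted, hint_code


def run(adapted, hint_code):
    """Return the instructions executed, following each break into the hint code."""
    trace = []
    for instr in adapted:
        if instr == BREAK:
            trace.extend(i for i in hint_code if i != RETURN)
        else:
            trace.append(instr)
    return trace


object_code_60 = ["I1", "I2", "I3", "I4"]
object_code_62, hint_code_64 = adapt(object_code_60, 2, "H1")
print(run(object_code_62, hint_code_64))  # ['I1', 'I2', 'H1', 'I3', 'I4']
```

In this sketch the original instruction order is preserved and the only
additional work performed is the hint instruction.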

One example of a hint instruction is a pre-fetch 
instruction that includes a pre-fetch address. The 
processor 10 executes a pre-fetch instruction by 
fetching a set of data from a memory using the
pre-fetch address and writing the data into a cache
associated with the processor 10. The cache may be 
separate from the processor 10 or may be integrated 
into the processor 10 as an on-chip cache. 

Another example of a hint instruction is a
branch prediction that specifies a likely result of a
branch instruction in the sequence of instructions I_1
through I_n.

The hint instructions provided to the processor
10 using the present techniques may be adapted to the
micro-architecture of the processor 10 to speed
instruction execution. For example, if the
micro-architecture of the processor 10 includes a
relatively large on-chip cache then the object code
adapter 14 may provide a relatively large number of
pre-fetch hint instructions to the processor 10 and
insert corresponding break instructions. Conversely,
if the micro-architecture of the processor 10
includes a relatively small on-chip cache then the
object code adapter 14 may provide relatively few
pre-fetch hint instructions to the processor 10
because excessive pre-fetches would be more likely to
cause undesirable evictions from the on-chip cache.
In addition, if the micro-architecture of the
processor 10 yields a relatively long latency on load
memory instructions that miss a cache then the object
code adapter 14 may provide a relatively large number
of pre-fetch hint instructions to the processor 10.

In another example, if the micro-architecture of
the processor 10 includes a relatively sophisticated
on-chip branch prediction capability then the object
code adapter 14 may not provide branch predictions to
the processor 10. Conversely, if the
micro-architecture of the processor 10 includes little or
no branch prediction capability then the object code
adapter 14 may liberally provide branch predictions
to the processor 10. The object code adapter 14 may
take into account the length of an instruction
pipeline in the processor 10 when providing branch
predictions because a relatively longer pipeline
would cause a relatively large penalty on a branch
mis-prediction.
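
By way of illustration only, the manner in which an object code adapter might
scale its hints to a micro-architecture may be sketched as follows. The
MicroArch fields, the numeric thresholds, and the function name hint_policy are
all hypothetical; the present teachings do not specify particular values.

```python
from dataclasses import dataclass


@dataclass
class MicroArch:
    cache_kib: int              # on-chip cache size in KiB (0 if none)
    miss_latency_cycles: int    # latency of a load that misses the cache
    has_branch_predictor: bool  # sophisticated on-chip branch prediction
    pipeline_depth: int         # number of instruction pipeline stages


def hint_policy(m: MicroArch) -> dict:
    """Return hypothetical hint budgets for a window of instructions."""
    # A larger cache tolerates more pre-fetches before useful lines are evicted.
    prefetch_budget = min(m.cache_kib // 8, 16)
    # A long miss latency makes each successful pre-fetch more valuable.
    if m.miss_latency_cycles >= 100:
        prefetch_budget *= 2
    # With a capable predictor no branch hints are given; otherwise hint more
    # liberally as the pipeline, and so the mis-prediction penalty, grows.
    branch_budget = 0 if m.has_branch_predictor else m.pipeline_depth
    return {"prefetch_hints": prefetch_budget, "branch_hints": branch_budget}


big = MicroArch(cache_kib=256, miss_latency_cycles=200,
                has_branch_predictor=True, pipeline_depth=14)
small = MicroArch(cache_kib=8, miss_latency_cycles=20,
                  has_branch_predictor=False, pipeline_depth=5)
print(hint_policy(big))
print(hint_policy(small))
```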



The object code adapter 14 may be implemented in
software or hardware or a combination of
hardware/software. In one embodiment, the object
code adapter 14 examines a sliding window of
sequential instructions in the object code 60 when
determining hint instructions.

The break mechanism of the processor 10 may be
implemented in a wide variety of ways. For example,
the processor 10 may be designed to branch to a
predetermined address when a break instruction is
executed and the object code adapter 14 stores the
hint code 64 at that predetermined address. In some
implementations, the predetermined address for code
to be executed on a break instruction may be
alterable using an internal register in the processor
10, possibly using a special instruction for the
processor 10. The break mechanism of the processor
10 may rely on a branch instruction in the hint code
64 to resume normal execution or may include an
internal mechanism for resuming normal execution.

In an alternative embodiment, the computer
system 200 performs break operations at specified
time intervals. For example, the object code adapter
14 may insert break instructions at predetermined
time intervals or the processor 10 may break at
predetermined time intervals. The breaks cause the
processor 10 to branch to code that selects hint

Figure 2 illustrates a computer system 100 which
includes an object code adapter 15 that provides hint
instructions to a processor 11 according to the
present teachings. The object code adapter 15
provides hint instructions to the processor 11 using
a mechanism for handling break instructions which is
built in to the processor 11 and a hint register 12
contained in the processor 11.

In one embodiment, the object code adapter 15
uses the present techniques to adapt a set of object
code represented as an instruction stream 16 to a
micro-architecture of the processor 11. The
instruction stream 16 includes a sequence of
instructions I_1 through I_n in object code according to
the macro-architecture of the processor 11.

The object code adapter 15 generates an
instruction stream 18 for execution by the processor
11 by inserting a set of break instructions B_1 through
B_x into the instruction stream 16 in place of selected
instructions. For example, the break instruction B_1
replaces the instruction I_2. The break instructions
B_1 through B_x cause the processor 11 to obtain and
execute hint instructions which are provided via the
hint register 12 in the processor 11.

In one embodiment, the hint register 12 holds a
set of parameters including a hint instruction (H_x),
an instruction (I_x), and an address (P_{x+1}(address)). The
hint instruction H_x is an instruction to be executed
by the processor 11 in response to a next break
instruction in the instruction stream 18. The
instruction I_x is the instruction in the instruction
stream 16 that was replaced by the break instruction
to which the hint instruction H_x corresponds. The
address P_{x+1}(address) is an address from which to
obtain a next set of parameters P_{x+1} to be loaded into
the hint register 12.

The computer system 100 includes a memory 20 and
a cache 22 which may hold a hint table of parameters
P_x to be loaded into the hint register 12. Table 1
shows an example hint table.

Table 1

Table Address   H_x                    I_x          P_{x+1}(address)
address_1       pre-fetch A0           ADD R0, 1    address_2
address_2       branch prediction T1   MOV R3, R4   address_3
address_3       pre-fetch A1           LD R1        address_4



In one embodiment, the object code adapter 15
examines a sliding window of sequential instructions
in the instruction stream 16 when determining hint
instructions. For example, at a particular point in
time the object code adapter 15 may examine the
instructions I_1 through I_10 and determine a hint
instruction for one of the instructions I_3 through I_10
and insert the break instruction B_1 in place of the
instruction I_2.

In some embodiments, the processor 11 includes
multiple hint registers which may be used to provide
hint instructions according to the present
techniques.



Figure 3 shows the handling of a break
instruction by the processor 11. At step 120, the
processor 11 obtains a hint instruction H_x from the
hint register 12 and inserts the hint instruction H_x
into the instruction stream to be executed. At step
122, the processor 11 obtains the replaced
instruction I_x from the hint register 12 and inserts
it into the instruction stream to be executed. At
step 124, the processor 11 obtains a next set of hint
parameters P_{x+1} using the address P_{x+1}(address)
contained in the hint register 12 and loads the next
set of hint parameters P_{x+1} into the hint register 12.
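
By way of illustration only, steps 120 through 124 may be simulated as follows
using the entries of the example hint table. The HintParams type, the "BREAK"
encoding, the example stream, and the function name execute are hypothetical;
an actual processor would perform these steps in hardware.

```python
from dataclasses import dataclass


@dataclass
class HintParams:
    hint: str       # H_x, the hint instruction
    replaced: str   # I_x, the instruction the break replaced
    next_addr: str  # P_{x+1}(address), where the next parameters are stored


# Example hint table held in memory, keyed by table address.
hint_table = {
    "address_1": HintParams("pre-fetch A0", "ADD R0, 1", "address_2"),
    "address_2": HintParams("branch prediction T1", "MOV R3, R4", "address_3"),
    "address_3": HintParams("pre-fetch A1", "LD R1", "address_4"),
}


def execute(stream, hint_register, table):
    """Return the instructions executed, expanding each break from the hint register."""
    executed = []
    for instr in stream:
        if instr != "BREAK":
            executed.append(instr)
            continue
        executed.append(hint_register.hint)       # step 120
        executed.append(hint_register.replaced)   # step 122
        nxt = table.get(hint_register.next_addr)  # step 124
        if nxt is not None:                       # no entry yet: keep old contents
            hint_register = nxt
    return executed


stream_18 = ["I1", "BREAK", "I3", "BREAK", "I5", "BREAK"]
trace = execute(stream_18, hint_table["address_1"], hint_table)
print(trace)
```

Each break in the sketch consumes one row of the hint table and leaves the
hint register loaded with the row for the following break.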

Figure 4 shows a method for adapting the
instruction stream 16 to the micro-architecture of
the processor 11 according to the present teachings.
At step 110, the object code adapter 15 examines the
instruction stream 16 and determines a hint
instruction based on the instruction stream 16 and
the micro-architecture of the processor 11.

For example, the object code adapter 15 may
detect a branch instruction in the instruction stream
16 at step 110. In addition, the micro-architecture
of the processor 11 may include no branch prediction
capability. In response, the hint instruction
determined at step 110 may be a branch prediction for
the branch instruction detected at step 110. The
object code adapter 15 may determine the branch
prediction in any known manner using optimizations
that may be performed on the instruction stream 16 at
run-time.

In another example, the object code adapter 15
may detect a load memory instruction in the
instruction stream 16 at step 110 wherein the data
for the load memory instruction is not available in
the cache 22. In addition, the micro-architecture of
the processor 11 may include a relatively large
on-chip cache. In response, the hint instruction
determined at step 110 may be a pre-fetch instruction
having the memory address of the load memory
instruction detected at step 110.

At step 112, the object code adapter 15 inserts
a break instruction into the instruction stream 18 at
a point where the hint instruction determined at step
110 is to be executed. For example, if the
instruction I_n is a load memory instruction then the
object code adapter 15 may insert the break
instruction B_n far enough ahead of the load memory
instruction I_n so that the pre-fetch operation
executed when the break instruction B_n is encountered
by the processor 11 will be completed by the time the
processor 11 executes the load memory instruction I_n.
In another example, if the instruction I_n is a branch
instruction then the object code adapter 15 may
insert the break instruction B_n several cycles ahead
of the branch instruction I_n to provide the processor
11 with the corresponding branch prediction hint.
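
By way of illustration only, the choice of a break position "far enough ahead"
of a load memory instruction may be sketched as follows. The sketch assumes,
purely for simplicity, that every instruction takes a fixed number of cycles;
the function name break_position and its parameters are hypothetical.

```python
def break_position(load_index, prefetch_latency, cycles_per_instruction=1):
    """Index at which to place the break so the pre-fetch completes before the load.

    load_index is the position of the load memory instruction in the stream,
    prefetch_latency is the number of cycles the pre-fetch needs, and
    cycles_per_instruction is an assumed fixed cost per instruction.
    """
    # Ceiling division: number of instructions that cover the latency.
    lead = -(-prefetch_latency // cycles_per_instruction)
    # The break cannot be placed before the start of the stream.
    return max(0, load_index - lead)


print(break_position(20, 8))     # 12
print(break_position(3, 8))      # 0
print(break_position(20, 9, 2))  # 15
```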

At step 114, the object code adapter 15 sets up
the hint parameters P_x consisting of the hint
instruction H_x determined at step 110, the instruction
I_x from the instruction stream 16 that was replaced by
the break instruction at step 112 when constructing
the instruction stream 18, and an address
P_{x+1}(address) for a next set of hint parameters P_{x+1}.
The hint parameters P_x may be written into the memory
20 at an address pointed to by the current
P_{x+1}(address) value in the hint register 12. The hint
parameters P_x set up at step 114 will be loaded into
the hint register 12 on the break instruction that
occurs in the instruction stream 18 before the break
instruction inserted at step 112. For example, if
the break instruction B_n is inserted at step 112 then
the hint parameters P_x set up at step 114 will be
loaded into the hint register 12 when the processor
11 encounters the break instruction B_{n-1}.
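
By way of illustration only, the chaining of hint parameters in memory, in
which each set names the address of the set that follows it, may be sketched
as follows. The HintChain class, the integer addresses, and the fixed stride
between entries are hypothetical simplifications.

```python
class HintChain:
    """Appends hint parameter sets so that each set names the address of the next."""

    def __init__(self, first_addr, stride=1):
        self.memory = {}             # stands in for the memory holding the table
        self.next_addr = first_addr  # where the next parameter set will be written
        self.stride = stride         # assumed fixed spacing between entries

    def append(self, hint, replaced):
        """Write (H_x, I_x, P_{x+1}(address)) at the pending address and return it."""
        addr = self.next_addr
        self.next_addr = addr + self.stride
        self.memory[addr] = (hint, replaced, self.next_addr)
        return addr


chain = HintChain(first_addr=100, stride=4)
first = chain.append("pre-fetch A0", "ADD R0, 1")
second = chain.append("branch prediction T1", "MOV R3, R4")
print(first, second, chain.memory[first])
```

Because each entry is written at the address named by its predecessor, the
parameters for a given break are already in place when the preceding break
reloads the hint register.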

The contents of the hint register 12 may be
initialized by the object code adapter 15. For
example, the processor 11 may be implemented with an
instruction for loading the hint register 12 and the
object code adapter 15 may insert that instruction
with appropriate parameters into the instruction
stream 18 before inserting break instructions.

Figure 5 shows an example micro-architecture for
the processor 11. The processor 11 in this
embodiment includes an instruction pipeline 40 and a
set of functional units 30-38 which perform hardware
operations associated with instruction execution.
For example, the decode unit 30 performs instruction
decode operations, the register unit 32 performs
register operations and includes a set of registers
including the hint register 12, and the memory unit
38 performs load memory and pre-fetch operations. The
branch unit 34 determines updated instruction
pointers by resolving branch instructions.

The instruction pipeline 40 holds the
instructions I_3 through I_8 in corresponding stages of
instruction execution. The processor 11 replaces the
break instruction B_2 which was received in the
instruction stream 18 with the instruction I_9 obtained
from the hint register 12. The instruction I_9 was
stored in the hint register 12 as the replaced
instruction I_x. In this embodiment, the decode unit
30 detects the break instruction B_2 and obtains the
instruction I_9 from the hint register 12 and places it in
the first stage of the pipeline 40.

Figure 6 shows the instruction pipeline 40 one
processor cycle after the break instruction B_2 is
detected. The break instruction B_2 is replaced by the
hint instruction H_2 obtained from the hint register 12
and a load memory instruction LD_2 having a memory
address obtained from the hint register 12 is
inserted into the instruction pipeline 40 to read the
next hint instruction/address pair for the hint
register 12. The load memory operation LD_2 is
performed by the memory unit 38. If the hint
instruction H_2 is a pre-fetch operation then it is
performed by the memory unit 38. If the hint
instruction H_2 is a branch prediction then it is used
by the branch unit 34 when generating an updated
instruction pointer for the subsequent branch
instruction in the instruction stream 18.



In another alternative embodiment, two special
types of break instructions are used to provide hints
to the processor 11 and the hint register 12 is used
to hold target addresses for instruction execution.
When a first special type of break instruction is
encountered, the processor 11 branches to a target
address specified in the hint register 12 (which
points to hint code set up by the object code adapter
15) and inserts the address of the instruction that
caused the break into the hint register 12. The
processor 11 then executes the desired hint
instructions included in the hint code. The last
instruction in the hint code is a second special type
of break instruction which causes an address for a
next set of hint code to be loaded into the hint
register 12.
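
By way of illustration only, the use of two special types of break instruction
may be simulated as follows. The "BREAK_A" and "BREAK_B" encodings, the
("EXEC", ...) tuples, and the placement of the replaced instruction inside the
hint code are assumptions made for the sketch; the manner in which execution
resumes after the hint code is modeled simply by continuing with the next
instruction in the stream.

```python
def run(stream, hint_code, first_hint_addr):
    """Return the instructions executed for a stream containing BREAK_A breaks."""
    hint_register = first_hint_addr  # initially the address of the first hint code
    trace = []
    for pc, instr in enumerate(stream):
        if instr != "BREAK_A":
            trace.append(instr)
            continue
        # First break type: branch to the target held in the hint register and
        # store the address of the breaking instruction in the hint register.
        target, hint_register = hint_register, pc
        for op, arg in hint_code[target]:
            if op == "BREAK_B":
                # Second break type: load the address of the next hint code.
                hint_register = arg
                break
            trace.append(arg)  # op == "EXEC": an ordinary hint-code instruction
    return trace


hint_code = {
    "hc1": [("EXEC", "pre-fetch A0"), ("EXEC", "I2"), ("BREAK_B", "hc2")],
    "hc2": [("EXEC", "branch prediction T1"), ("EXEC", "I4"), ("BREAK_B", None)],
}
stream = ["I1", "BREAK_A", "I3", "BREAK_A", "I5"]
result = run(stream, hint_code, "hc1")
print(result)
```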

In yet another alternative embodiment, the hint
register 12 holds three addresses including the
address of a first set of hint code and the address
of the first break instruction in an instruction
stream to be executed. When a break instruction is
encountered, the processor 11 branches to the address
of the first set of hint code contained in the hint
register 12 and moves the address of the break
instruction into the first position in the hint
register 12. The hint code when executed then
inserts the addresses of a next set of hint code and
a next break instruction into the second and third
positions of the hint register 12. When a special
break instruction is encountered in the first set of
hint code, the processor 11 moves the addresses in the
second and third positions of the hint register 12 to
the first two positions in the hint register 12 in
preparation for the next break instruction.

The foregoing detailed description of the
present invention is provided for the purposes of
illustration and is not intended to be exhaustive or
to limit the invention to the precise embodiment
disclosed. Accordingly, the scope of the present
invention is defined by the appended claims.